Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation
نویسندگان
چکیده
Convolutive non-negative matrix factorization (CNMF) and deep neural networks (DNN) are two efficient methods for monaural speech separation. Conventional DNN focuses on building the non-linear relationship between mixture and target speech. However, it ignores the prominent structure of the target speech. Conventional CNMF model concentrates on capturing prominent harmonic structures and temporal continuities of speech but it ignores the non-linear relationship between the mixture and target. Taking these two aspects into consideration at the same time may result in better performance. In this paper, we propose a joint optimization of DNN models with an extra CNMF layer for speech separation task. We also utilize an extra masking layer on the proposed model to constrain the speech reconstruction. Moreover, a discriminative training criterion is proposed to further enhance the performance of the separation. Experimental results show that the proposed model has significant improvement in PESQ, SAR, SIR and SDR compared with conventional methods.
منابع مشابه
Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint
Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Non-negative Matrix Factorisation (NMF), a method for finding parts-based representations of non-negative data. Here, we present an extension to convolutive NMF that includes a sparseness constraint, ...
متن کاملDiscovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints
Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Nonnegative Matrix Factorisation (NMF), which is a method for finding parts-based representations of non-negative data. Here, we present an extension to convolutive NMF that includes a sparseness cons...
متن کاملDiscovering Convolutive Speech Phones Using Sparseness and Non-negativity
Abstract Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Non-negative Matrix Factorisation (NMF). Here, we present a convolutive NMF algorithm that includes a sparseness constraint on the activations and has multiplicative updates. In combination w...
متن کاملSparse NMF – half-baked or well done?
Non-negative matrix factorization (NMF) has been a popular method for modeling audio signals, in particular for single-channel source separation. An important factor in the success of NMF-based algorithms is the “quality” of the basis functions that are obtained from training data. In order to model rich signals such as speech or wide ranges of non-stationary noises, NMF typically requires usin...
متن کاملMultichannel nonnegative matrix factorization in convolutive mixtures for audio source separation Factorisation en matrices à coefficients positifs de données multicanal convolutives pour la séparation de sources audio
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the Short-Time Fourier Transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired from nonnegativ...
متن کامل